Hierarchical Agglomerative Clustering for Cross-Language Information Retrieval

Authors

  • RAYNER ALFRED
  • ELENA PASKALEVA
  • MARK BARTLETT
Abstract

In this article, we report on applying hierarchical agglomerative clustering (HAC) to a large corpus of documents, each of which appears in both Bulgarian and English. We cluster the documents separately for each language and compare the results with respect to both the shape of the tree and the content of the clusters produced. Clustering multilingual corpora provides insight into the differences between languages when term frequency-based information retrieval (IR) tools are used. It also allows natural language processing (NLP) and IR tools for one language to be used to implement IR for another language. For instance, the most relevant articles to be translated from language X to language Y can be selected after studying the clusters of abstracts in language Y.
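
The per-language clustering and comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a linkage criterion, a similarity measure, or a comparison metric, so average linkage, cosine similarity over raw term frequencies, and the Rand index are all assumptions made here. The four-document parallel corpus is invented toy data, and the function names (tf_vector, cosine, hac, pair_agreement) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hac(docs, k):
    """Average-linkage HAC (an assumed choice); merge until k clusters remain.

    Returns a list of clusters, each a frozenset of document indices.
    """
    vecs = [tf_vector(d) for d in docs]
    sim = {(i, j): cosine(vecs[i], vecs[j])
           for i, j in combinations(range(len(docs)), 2)}
    clusters = [frozenset([i]) for i in range(len(docs))]

    def linkage(c1, c2):
        # Average pairwise similarity between members of the two clusters.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the most similar pair of clusters.
        a, b = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


def pair_agreement(clusters_x, clusters_y, n):
    """Rand index: share of document pairs treated alike by both clusterings."""
    def labels(clusters):
        return {i: idx for idx, c in enumerate(clusters) for i in c}

    lx, ly = labels(clusters_x), labels(clusters_y)
    pairs = list(combinations(range(n), 2))
    agree = sum((lx[i] == lx[j]) == (ly[i] == ly[j]) for i, j in pairs)
    return agree / len(pairs)


# Invented toy parallel corpus: document i is the same text in both languages.
english = ["football match goal", "football team goal",
           "bank market shares", "bank shares price"]
bulgarian = ["футбол мач гол", "футбол отбор гол",
             "банка пазар акции", "банка акции цена"]

clusters_en = hac(english, k=2)
clusters_bg = hac(bulgarian, k=2)
print(pair_agreement(clusters_en, clusters_bg, len(english)))
```

Because the documents are aligned by index, the two clusterings can be compared directly on the same set of document identifiers. On a real corpus, differences in morphology and vocabulary between the two languages would be expected to lower the agreement score, which is the kind of cross-language difference the abstract refers to.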

Similar resources

Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis

Clustering is one of the most interesting and important tools for research in data mining and other disciplines. The aim of clustering is to find the relationships among data objects and classify them into meaningful subgroups. The effectiveness of clustering algorithms depends on the appropriateness of the similarity measure used to compare the data. This pap...

Clustering of Web Search Results Using Semantic

Clustering is a data mining technique used in information retrieval. Clustering documents allows relevant information to be retrieved quickly. It organizes the documents into groups, each containing documents with similar content. Different algorithms are used for clustering documents, such as partitioned clustering (K-means clustering) and hierarchical clustering (...

Hierarchical Clustering in Medical Document Collections: the BIC-Means Method

Hierarchical clustering of text collections is a key problem in document management and retrieval. In partitional hierarchical clustering, which is more efficient than its agglomerative counterpart, the entire collection is split into clusters and the individual clusters are further split until a heuristically-motivated termination criterion is met. In this paper, we define the BIC-means algori...

Feature Location in a Collection of Product Variants: Combining Information Retrieval and Hierarchical Clustering

Locating source code elements relevant to a given feature is an important step in the process of re-engineering software variants, developed by an ad-hoc reuse technique, into a Software Product Line (SPL) for systematic reuse. Existing works on using Information Retrieval (IR) techniques do not consider the abstraction gap between feature and source code levels. In our recent work, we have imp...

An Experimental Study on Content Based Image Retrieval Based On Number of Clusters Using Hierarchical Clustering Algorithm

Content-based image retrieval (CBIR) is becoming a source of exact and fast retrieval. CBIR presents challenges in indexing and accessing image data and in how end systems are evaluated. Data clustering is an unsupervised method for extracting hidden patterns from huge data sets. Many clustering and segmentation algorithms suffer from the limitation of the number of clusters speci...

Publication date: 2011